DEVEA: an interactive shiny application for Differential Expression analysis, data Visualization and Enrichment Analysis of transcriptomics data

We are at a time of considerable growth in transcriptomics studies and subsequent in silico analysis. RNA sequencing (RNA-Seq) is the most widely used approach to analyse the transcriptome and is integrated in many studies. The processing of transcriptomic data typically requires a noteworthy number of steps, statistical knowledge, and coding skills, which are not accessible to all scientists. Despite the development of a plethora of software applications over the past few years to address this concern, there is still room for improvement. Here we present DEVEA, an R shiny application tool developed to perform differential expression analysis, data visualization and enrichment pathway analysis mainly from transcriptomics data, but also from simpler gene lists with or without statistical values. The intuitive and easy-to-manipulate interface facilitates gene expression exploration through numerous interactive figures and tables, and statistical comparisons of expression profile levels between groups. Further meta-analysis such as enrichment analysis is also possible, without the need for prior bioinformatics expertise. DEVEA performs a comprehensive analysis from multiple and flexible data sources representing distinct analytical steps. Consequently, it produces dynamic graphs and tables, to explore the expression levels and statistical results from differential expression analysis. Moreover, it generates a comprehensive pathway analysis to extend biological insights. Finally, a complete and customizable HTML report can be extracted to enable the scientists to explore results beyond the application. DEVEA is freely accessible at https://shiny.imib.es/devea/ and the source code is available on our GitHub repository https://github.com/MiriamRiquelmeP/DEVEA.


Introduction
RNA sequencing (RNA-seq) has become a routine and popular technique for genome-wide and transcriptomics expression analysis. 1 As a result, techniques analyzing RNA are extensively incorporated in basic science research and are even increasingly used as molecular diagnostics for human health. These may include diagnosis, prognosis and therapeutic selection. 2 However, in order to leverage the full power of this technique, several stages and tools are necessary to translate expression profiles into valuable outcomes. The R statistical environment 3 provides many well-known packages to perform key steps of a complete RNA-seq analysis pipeline. Possible examples include differential expression analysis (DEA) functions, leading to lists of differentially expressed genes (DEGs), and annotation enrichment analysis (EA, sometimes called pathway analysis) libraries, which will identify biological pathways or cellular functions significantly enriched from the list of DEGs.
Nevertheless, most of these powerful packages are command-line based or demand coding knowledge and are therefore out of reach for scientists with limited computational training. Besides, analyses can be started at different points in the workflow, from raw or partially analyzed data from different tools, to individual lists of favorite final features. Thus, tools providing several ways to start an analysis are more flexible than others using a single data type. Providing flexible user-friendly tools for the analysis and visualization of gene expression data can help researchers to move from highthroughput genomics to basic scientific research. To bridge this gap, an increasing number of software tools are being released, based on intuitive, point-and-click, graphical interfaces. Frameworks such as R Shiny, 4 an R package, facilitate the creation and release of interactive web tools. Certain RNA-seq analysis applications from the literature may include iDEP, 5 GENAVi 6 and ideal, 7 among others.
However, there are still ways to improve the functionalities of these tools. For instance, supporting input data types of different levels of complexity can extend the level of performance of the tool. It is also essential to include widely used types of analyses such as differential expression and pathway analysis, with enough options for calculations and graphical representation to generate valuable results. At last, user-friendliness is a particularly important point since these tools aim at helping the non-specialist. Therefore providing a robust and easily accessible web interface is an essential asset. With these considerations in mind, we have developed DEVEA, a new interactive R Shiny application for DEA, data exploration, data visualization and functional EA. DEVEA provides an easy-to-use interface to load data in various formats and complexities according to the stage of the analysis, including raw RNA-seq count data, pre-analyzed data, simple lists of genes or proteins obtained from different sources, with or without statistical values associated. From these different types of input data, it generates a wide set of dynamic plots and tables allowing quick navigation through gene expression profile or enrichment analysis results. The outputs can be downloaded easily and the user can create custom and operable reports in HTML format. DEVEA is implemented as a publicly available web server and can be optionally downloaded to be used locally. DEVEA aims to conduct a proper analysis by reaching out to both life scientists (gathering the biological expertise) and bioinformaticians (offering the technical expertise), and to foster communication between the two sides to promote easier and more extensive analysis of data.

Methods
Operation DEVEA was built as a Shiny application 4 in R 3 (V.4.1.1). Shiny is a package that facilitates the development of web applications from R. It is particularly indicated for building interactive and user-friendly software wrappers.
The tool is hosted on a remote, freely accessible web server (http://shiny.imib.es/devea). Apart from DEVEA's public web server, the application can be used on a local computer (see the supplementary material for a detailed procedure here https://github.com/MiriamRiquelmeP/DEVEA/blob/main/Supplemental-Information.md). Its source code is available on GitHub (https://github.com/MiriamRiquelmeP/DEVEA), under the terms of the Apache license 2.0. DEVEA has been tested in Linux and Windows 10 operating systems locally, and has also been launched remotely with different browsers (Google Chrome, Mozilla Firefox and Internet Explorer). However, for the best user experience in terms of rendering and visualization, we recommend to use Mozilla Firefox. Other browsers may present display issues when deploying some of the elements of the application and generate errors. Locally running the application shares all the same characteristics as the Shiny web application. A comprehensive guide on how to use the application from the different input modes can also be explored through the accessible tutorial from both DEVEA modules (DESeq DEVEA and Simple DEVEA) in the 'Tutorial' section from the top controls and independently on https://shiny.imib.es/DESeqDevea/tutorial.html or https:// shiny.imib.es/simpleDevea/tutorial.html.
DEVEA relies on several existing R packages to carry out all the functionalities proposed (see supplementary material for the complete list). For instance, in order to handle the calculation of DEGs, the analysis is largely based on DESeq2 package. 8 The annotation is managed by the AnnotationDbi package, 9 collecting all dedicated annotation databases for the different species (currently Homo sapiens, Mus musculus, Rattus norvegicus and Arabidopsis thaliana) for robust name conversion. For the enrichment calculations and visuals, the R packages topGO, 10 fgsea (Fast Gene Set Enrichment Analysis) 11 and clusterProfiler 12 together with other basic dependencies, such as ggplot2 13 and plotly 14 have been used. All those packages are widely used by the community and well maintained. We have selected DESeq2 for differential expression analysis as it is one of the most popular and well-tested tools for this task. Furthermore, it was shown 15 that DESeq2 has a good overall performance regardless of presence of outliers and proportion of DE genes compared with other methods, such as Limma-Voom, 16 BaySeq 17 and edgeR 18 among others.
The full DEVEA global workflow is shown in Figure 1. The main analysis path can be launched at different steps depending on the four input modes (check Data requirement section for details), represented at the top of the figure. The dashed arrows indicate where every input is incorporated in the pipeline, until the end of the workflow. Objects that are more complex will generate more results. For each step of the analysis, intermediate results are available as tables and graphical representations in their dedicated spaces, detailed below in the circular notes. The vast majority of tables and plots are interactive, allowing the user to visualize data in real-time as well as to interact efficiently and can be individually downloaded. In the end, a global report can be generated and annotated. Each of these steps of the complete analysis will be described in their corresponding sections in the next paragraphs.

Data requirement
The tool has four main data input modes ( Figure 2): • CM + SI mode: refers to a counting matrix (CM), containing the raw number of counts per gene as round digits, where columns correspond to samples and rows to features. Only un-normalized raw counts are accepted as input data, as obtained from the counting process. Data from other sources containing decimals (e.g. RSEM) has to be rounded before being uploaded in DEVEA. The CM should be associated with another file gathering the sample information (SI), as a data frame containing metadata about each sample, with the first column the identifier used by default as a label in visualizations. This can be modified afterward during the analysis. It should include any other relevant experimental factor (e.g. treatment/control, sex, cell type, tissue, etc). The design of the comparison will be determined by one of these factors. The column names in the CM and the first row names in the SI must be identical, and the gene IDs in this file can be included in Symbol or ENSEMBL format. Both files can be in .CSV, .TXT or .XLSX format.
• DO mode: based on a DeseqDataSet object (DO) generated by the DESeq() function from DESeq2 package. It is an object used to store the input values, intermediate calculations and results from a DEA. The user must have created it with the CountData field as the data matrix of counts, the ColData field with the sample information and a design formula specifying the experimental level to test for DEA. The first column in the CountData and the first row in the ColData are equal. The gene names can be included as gene Symbols or ENSEMBL gene IDs. The object must be compressed and extracted from R in .RDS format. If the differential expression object has been generated with a different tool or package, you may use the DEFormats 19 R function for a possible conversion.
• GL + SV mode: a gene list (GL) with associated statistical values (SV) per gene. The first column should contain gene names (in Symbol or ENSEMBL format), the second column the fold-change and the third column the statistical adjusted p-value, in this precise order. Column names should be provided without special characters. The numerical values will be used without further modifications by DEVEA to set the threshold of expression change and the significance. This table can be uploaded in .CSV, .TXT or .XLSX file format.
• GL mode: a Gene list (GL) consisting on a single column file in .CSV, .TXT or .XLSX format containing the gene names (in Symbol or ENSEMBL) and including a column name without special characters (i.e. GeneName, Genes, ID, etc). The gene list can be copied and pasted directly into the dedicated field in DEVEA, without the column name.
While the main data type for DEVEA's usage is RNA-seq data, it is worth noting that the simple gene list (GL) can be built from any other type of "omics" datasets, as long as the identifiers are recognized by DEVEA. Another example could be the use of GL + SV mode with treated data from microarray analysis, where values such as FC of expression between groups and adjusted p-value are available from the signal intensity of the probes. It is highly recommended to work with log2FC and adjusted p-values. The more elaborated input data types, such as the counting matrix (CM + SI), can be built in some cases from different data where they can safely be processed by DESeq2 functions. An example may be mass spectrometry data, the method of choice for quantitative. With label-free proteomics, it is possible to quantify proteins by using their spectral counts as an approximation of protein abundance, and then use statistical models such as DESeq2 even if they are designed specifically for count data. A study comparing different statistical methods for differential expression detection in label-free mass spectrometry proteomics shows that DESeq2 performed well both in terms of detection of true positives as well as controlling for the number of false negatives. 20 Therefore, it is perfectly possible to use this type of data in DEVEA, as long as the values represent unique measurements as integer numbers, and the protein IDs are replaced by their coding gene name.

Getting started
To start working with DEVEA, the adequate module to perform the analysis has to be chosen from DEVEA's main lobby interface. The decision depends on the input data format. The user has to choose 'Go DESeq DEVEA' if their input is a CM + SI or a DO, and the 'Go Simple DEVEA' mode in the case of a GL + SV or a simple GL input files. See Figure 3 for a visual screenshot of the lobby.

Data upload and statistical design specification
Within the appropriate interface according to the data type, the first tab available corresponds to the Input data section.
The user has to upload their own data in one of the different accepted formats and types (see them on Data requirement section) in their dedicated spaces (see Figure 4A). For all input data, a field to specify the custom dataset to use as a background universe is available as well (see Figure 4B). If necessary for the user, some demo data representing the four different input types, can be found on DEVEA's GitHub https://github.com/MiriamRiquelmeP/DEVEA/tree/main/data. The nature of the demo data and how it was generated are described in the Use case section.
When a CM + SI or a DO is used as input data, it is important to indicate the statistical design or contrast for the expected comparison. The design formula expresses the variables that will be used to calculate the differential expression in following steps. For the CM + SI input format, the levels of interest that will compose the final design must be included in one of the columns of SI file. By entering the column name in the 'Select column for contrast' field, the program extracts the conditions and calculates the combination based on the distinct levels (i.e. Treatment_Control_vs_Treatment if column Treatment is selected or Sex_Male_vs_Female will be displayed if Sex is selected from SI in Figure 2 -CM + SI). In cases where more than two levels are available, the application will propose all one-vs-one combinations. Then, the relevant combination for the analysis has to be selected by the user in the 'Select design' part. In a DO, the user can use the function relevel() from DESeq2 in R to specify the basal level.
With a DO data type, the column for the design must have been already incorporated when generating the DESeq2 object in R. Only simple designs will be generated and/or can be selected from 'Select design' field (i.e. Sex_Male_vs_Female if design = Sex is specified in the formula as in Figure 2 -DO). Several conditions to be treated can be included in the design of DESeq2 within the same DO. DEVEA will offer the possibility to consider any of them for each analysis, and the user can select which contrast to explore from the same DO.
It is possible to model batch effects in DEVEA with the DO object. In that case, the user can include the batch effect directly in the design formula (i.e. design =~Batch + Condition). Once the final contrast is specified as Condition_level1_vs_level2, the batch effect will be accounted for in the statistical model. In that case, the batch effect is treated in DEVEA as a covariate in the regression model.
The user can also remove the batch effect before uploading the DO object into DEVEA, for instance by using the removeBatchEffect() function of the Limma-Voom R package, 16 or other packages such as ComBat-seq. 21 Differential expression analysis (DEA) and data view The first key performance of the application consists in extracting the descriptive information based on the feature expression and the statistical contrast for DEA. In CM + SI data type a new DeseqDataSet object is calculated from the files and information provided by the user. In DO or GL + SI input modes, the application retrieves the important values already included in these objects. All transformations, normalization and measurements applied to the data at this step are performed with functions included in DESeq2 R package. It should be noted that DEA calculation is not possible with the simple GL input mode, due to the lack of expression values and statistical details. The following comments will not apply to this object.
The calculations and the statistical results are accessible in the 'Preview dataset' section tab. At this step of the analysis, the user can explore the number of features considered as differentially expressed and their direction, and establish the descriptive statistics thresholds to consider them DEGs. By default, DEVEA uses prefixed log2 fold change |lfc| > 0.5 and adjusted p-value < 0.05 thresholds, that can be adapted by the user at any moment and thus modulate the list of DEGs. Moreover, the information uploaded and the descriptive statistics will be used to establish and control some interactive parts of the plots. For example, the color of defined up-regulated or down-regulated genes can be chosen ( Figure 5A). For CM + SI and DO, a complete table of results, named 'Statistical -Expression values', is displayed showing useful information such as base means across samples, log2 fold changes, standard errors, raw and adjusted p-values for the specific design selected. A second table is also shown with the DEA details called 'Samples information -Coldata' ( Figure 5B). For the GL + SV mode, raw data are displayed in a table. The user can monitor gene name conversion, explore values interactively and sort, filter and download them at any time.
It is important to stress that, as different elements can be extracted from the distinct input objects depending on their complexity, the number of graphs available for each of them vary (Figures 1 and 6). Using the raw expression values that are available only in CM + SI and DO input modes, the user can explore data in a Principal Component Analysis plot (PCA) with the top 500 variant features, to show clusters of samples based on their similarity selecting the principal components of interest; a box or violin plot for gene expression distribution across the dataset; a heatmap representation

Enrichment analysis (EA) and visualization
The last stage of the analysis with DEVEA is EA. This is a method to identify classes of defined categories that are overrepresented in the list of DEGs. These categories may be associated with disease phenotypes, biological pathways, or cellular functions. DEVEA uses the differentially expressed and significant features to retrieve the over-represented terms from several well-known databases. This major block of the DEVEA analysis can be carried out from all data input types. It consists of an extensive EA after the selection of appropriate statistical values for defining DEGs from CM + SI, DO and GL + SV, or using all components included in the simple GL input. It collects significant terms from KEGG (Kyoto Encyclopedia of Genes and Genomes 23 ) and GO (Gene Ontology 24 ) Biological process, Molecular function and Cellular component databases. Furthermore, a GSEA (Gene Set Enrichment Analysis 25 ) and leading edge exploration analysis can be performed from different databases for the whole set of features. In the case of CM + SI, DO and GL + SV input modes, KEGG and GO analyses are performed for all DEGs together, and for the subset of up-and down-regulated genes in separated tabs. GSEA is always performed on the complete set of genes and uses the statistical values associated with the features. With the simple GL data input, enrichment can only be performed for the whole set of genes for KEGG and GO analysis and no GSEA will be possible, due to the lack of statistical information.
The main results of EA are shown as interactive tables containing detailed information on the enrichment from each database. In KEGG and GO categories, the tables display columns for the name of the significant pathways or terms, their adjusted p-values and additional descriptive information such as total number of genes associated or the DEGs participating in the pathway. The user can also display the gene names that match in the pathway from the "+" symbol. Below each table, additional plots can be created by selecting rows of interest (showed in green on the Figure 9A). The plots are interactive and reactive. They can be changed at any time by selecting new lines in the  Figure 9B). Leading edge results have also been implemented in the plot when one unique pathway is displayed, as a red line indicating the extent of what we consider the leading edge genes.
As a special feature offered by DEVEA, a custom gene list can be uploaded to be used as the background gene universe for EA ( Figure 4B). For example, the user can use a background list containing only expressed genes in this experiment. It will control for experimental context and enable representative functions, rather than only the functions explaining the nature of the samples, to be identified. This is especially important to consider in microarray studies when a limited number of probes is used, or in studies with specific cells or tissues showing a restricted set of detectable genes.
For this EA, internal ENTREZ ID gene code is used to associate gene names in Symbol or ENSEMBL with the enrichment annotation in KEGG, GO and GSEA libraries. An exhaustive conversion across different functions is conducted to retrieve all possible terms, since they are not homogenously registered in all conversion databases. Furthermore, not all genes are curated or annotated and therefore some will be left out of the EA (the portion of genes is indicated in the application below the 'preview table' as shown in Figure 5B). This is a limitation from the automated databases conversions without manual curation. For this reason, the number of species is limited and not all genes will be used for the EA, but the results obtained are more robust.

Global report
An interactive HTML report can be generated from all data types following analysis and exploration with DEVEA. It is available in the 'HTML report' button at the top fixed part of the application. To create it, the user can select the individual set of figures, tables and results to be kept in this single HTML document ( Figure 10A). Plots will retain the last aesthetic indicated in the graphical parameters (e.g. colors, shapes, labels, terms). Only full tables will be kept, without taking into account potential filters applied during the analysis, allowing full exploration, sorting and re-filtering of the whole dataset outside of DEVEA. It should be noted that the majority of the results can also be copied or downloaded in high-quality format at any step of the analysis within DEVEA. Importantly, comments can be included at any step of the analysis in a dedicated section at the top right position of the application, and will be automatically saved ( Figure 10B). They can be displayed in the final report to ensure that the observations made throughout the analysis, with special interpretations or results are maintained.
Please note that if any of the graphs or tables fails to be generated in the application, when there are not enough results, genes or pathways to display them, the report cannot be generated. Unselect the problematic plots or tables to be able to render the rest of the report without errors.

Use case
To demonstrate the usefulness of DEVEA, an RNA-seq dataset generated by our team and published recently 26 was investigated using the application. The study aimed to characterize the role of JAK2-STAT3 signaling in astrocytes in the context of Huntington's disease (HD). HD is a rare genetic neurodegenerative disease leading to severe motor, cognitive and psychiatric symptoms, with no curative treatment available. 27 Astrocytes, a heterogeneous group of star-shaped glial cells, perform key functions in the brain. They provide nutrients to neurons, regulate synaptic transmission, and contribute to brain repair following injury. 28 Astrocytes become reactive in the brain of HD patients and their impact on HD progression is still unclear. 29 The study used a genetic mouse model of HD. Treated mice were injected with an adeno-associated viral (AAV) vector targeting astrocytes and encoding a constitutive form of the JAK2 kinase (JAK2T875N) to activate the JAK2-STAT3 pathway. Control mice were injected with a similar AAV expressing GFP. Astrocytes were isolated and sequenced by RNA-seq on a HiSeq 2500 Illumina platform (2 Â 100 bp). Quality control of sequencing data was performed with FastQC 30 (v0.11.9). Reads were mapped on the GRCm38 (mm10) mouse genome assembly with Hisat2 31 (v2.2.1), and a counting matrix was generated. Quantification of reads associated with genes was achieved with featureCounts 32 (v2.0.0), and differential gene expression analysis was performed with DESeq2 (v1.28.1) Bioconductor (v3.13) package on R (v4.0.2). Only genes with a raw number of counts ≥ 10, in at least 3 samples were analyzed. Data were adapted and integrated as different input objects in DEVEA to test all functionalities.
For instance, Figure 7A shows that control (GFP, N = 5, in green) and treated (JAK, N = 6, in red) samples are clearly separated on a PCA plot, with a better separation achieved on PC2 (representing 27% of the total variance). Figure 7C-D also show two types of clustering profiles, which group samples from the same group together and display genes with higher variability. Figure 11A shows that the levels of Jak2 are higher in treated (JAK) versus control (GFP) groups, as expected by AAV-mediated gene transfer (log2(FC) = 6, associated adjusted p-value 8.7E-69). In addition, a volcano plot demonstrates that JAK2 causes down-regulation of many genes, shown in blue in the upper left quartile of the graph on Figure 11B. Finally, EA in the treated (JAK) versus control (GFP) list of DEGs shows that many GO-BP terms are related to Immunity/Inflammation, a process linked to the reactive changes in astrocytes induced by JAK2 ( Figure 11C).
All data corresponding to this research project are available under the four different DEVEA input types. They are available on the GitHub web site (see the Software Availability section). They consist in (1)  With this user case, no errors were detected throughout the analysis. DEVEA, thanks to a large range of graphical and statistical analyses, highlighted significant differences between reactive astrocytes in the JAK group and control astrocytes in the GFP group. Due to extensive EA, DEGs were associated with important biological functions such as immunity/inflammation pathways as well as cytokine signaling and proteostasis. These results are consistent with the conclusion described in the corresponding publication. 26 We have also generated demo files corresponding to an RNA-seq study of Arabidopsis thaliana from a recent publication. The data represents the differential expression analysis of 3 mutant fibrillins (FBN6) samples vs 3 controls. See the dedicated paper for further details. 33 Apart from these demo data, to facilitate the preparation of some input data for biologists who lack bioinformatics skills, we have provided on our GitHub two standard tutorials using the Galaxy platform. 34 The tutorials allow a user with no prior bioinformatics knowledge to generate read counts from raw RNA-seq .fastq data in two easily accessible stepby-step methods. Quality control and a few downstream analyses similar to those done within DEVEA can be explored following this pipeline. The tutorials can be accessed at https://github.com/MiriamRiquelmeP/DEVEA/blob/main/ Galaxy_tutorials.md.

Related works (state-of-the-art)
The field of application tools for transcriptomics data visualization, DEA and EA is constantly growing. DEVEA functionalities are compared with six similar applications recently published (iDEP, 5 ShinyGO, 35 DEGenR, 36 GENAVi, 6 RNfuzzyApp 37 and ideal 7 ). These tools operate through a graphical user interface, provide interactive results, and are based on stable and maintained R packages. A detailed comparison of the DEVEA main attributes is shown in Table 1. Characteristics that are related to data management and importation of data into the application have been stressed in the upper part of the table, followed by the various modes of DEG identification and different interactive graphical results, EA calculations and global reporting and webhosting. It is clear from Table 1 that most of the selected applications share many functionalities with DEVEA. One exception is the application ShinyGO, which offers significantly fewer options since it is designed only to perform enrichment calculations from simple gene lists. However, some other specific differences exist with other tools. DEVEA has more flexibility in terms of data type import in different formats, representing different stages of the analysis. Similarly, the user has a wide range of exploration possibilities via GL and GL + SV input formats, in terms of data generation and origin. The list of features and the associated statistics may have been generated from many different external tools or could represent several analysis types, as long as they are eventually converted into gene names (i.e. Microarray data results, proteomics results, gene lists from the literature, etc.). As a further advantage, the ability to import complex objects, such as DO, increases the number of visualization and analysis options. Despite this, not all possible visuals that can be generated from these objects are included in DEVEA, which has room for improvement. In particular, DEVEA could further develop data management functionalities, by extending the capacity of dealing with batch effects or data pre-processing and filtering. One of the possible drawbacks is the low number of available species for the EA compared with other tools or the potential mismatches when converting gene names. The EA, where there is information about the level of expression, can be generated from all DEGs, either split by the direction of expression change (only on up-or down-regulated genes), or merged into a single list. This is not always the case in applications that perform analyses on the whole list of potential genes of interest. The custom global report, a unique feature of DEVEA, might be very handy to share results with collaborators, because the user can easily insert comments and transfer the fully-interactive HTML report. Lastly, DEVEA appears slightly more flexible in terms of application hosting and running, since it is possible to run it online (DEVEA web server), or offline using R. For instance, certain applications such as DEGenR and RNfuzzyApp do not offer the possibility to run the application online.
Although overall applications share common analysis blocks, DEVEA presents more graphical variety than most of them. For example, as a criterion to consider in the table that an application contains "Interactive preview visuals" to preliminary explore the data, only one of PCA plot, violin/box plot of sample profile, heatmap for gene expression, sample clustering, gene expression dots plot per groups should be generated from the applications. For "Interactive DE visuals" of the DEA statistics and profile the criterion consist of including at least one of volcano plot, MA plot or karyoplot. Finally, for the "Interactive visuals" to navigate the EA, the marked tools include at least one plot among bars plot, dots plot, chord plot, heatmap, net plot, word cloud, circle plot of the enriched terms. For most applications, only a small subset of these plots are implemented. DEVEA contains all these graphs, requirement that none of the others met.

Conclusion and Future Perspectives
The DEVEA application is developed to improve the set of existing software to perform DEA, data visualization and annotation or EA from transcriptomics data. DEVEA meets the need for applications that give sufficient usage autonomy, without compromising the complexity and accuracy of the results. It provides an interactive and user-friendly interface accessible to users without bioinformatics training, with a high diversity of analysis components. Researchers can explore their data in real-time, carry out DEA and subsequent EA from distinct well-known databases without losing possible customization. DEVEA contains a large range of functions in a single tool, avoiding the use of different tools/websites to perform transcriptomics analysis, offering advanced ready-to-publish visuals, tables and results. One of the main strength is the incorporation of several input data types. Additionally, the use of robust R packages, especially the DEA DESeq2 package is an attractive functionality of the application. The possibility to include a custom background makes DEVEA better suited for analysis in which correction of some experimental bias could lead to better results in the EA, avoiding the inappropriate inflation of statistical p-values and false-positive results. Another key advantage is that DEVEA allows the user to extract their results individually or in an interactive format through a custom HTML format file. To further develop DEVEA analyses, we plan to offer additional pre-treatment options (e.g. removing batch effect, filtering genes by expression, etc.) and integrate more species and include transcription factor enrichment analysis. In conclusion, the purpose of DEVEA is to promote the dialogue between biologists and bioinformaticians, particularly to produce suitable data and to understand the validity of the data needed to create the best downstream results.
• Test files for every input mode can be found also on https://github.com/MiriamRiquelmeP/DEVEA/tree/main/ data.

Wenbin Guo
Information and Computational Sciences, James Hutton Institute, Dundee, UK In response to the feedback provided in the last review, the authors have carefully reviewed and revised their manuscript, taking into account all the comments and suggestions made. They have added new information where necessary to further support their claims. The authors have made sure that their arguments and findings are presented clearly and logically. Overall, the authors have shown great attention to detail and have made improvements to their manuscript.

Wenbin Guo
Information and Computational Sciences, James Hutton Institute, Dundee, UK

Summary
The author developed a shiny App DEVEA for biologists who lack extensive bioinformatics skills to perform complex differential expression analysis and downstream function annotation and pathway enrichment analysis. It provides an easy-to-use platform and empowers biologists to hand on complex RNA-seq analysis by themselves. The multiple options of inputs, including (a) the raw read counts of genes, (b) the intermediate dataset produced by DESeq2, (c) the gene list with statistics of significance and (d) the gene list only, allowed users to start the analysis flexibly from different types of data. It also provided interactive figures and tables to visualise the results in real-time. In the end, a report can be generated with user-selected results and figures, which improved the reproducibility of the RNA-seq analysis.
Below are some limitations of the current DEVEA and suggestions for future improvements. The author already had a plan to address some of them in future directions.
It seems that the current version of DEVEA only supports the analysis of human, mouse and rat, which limits its usage in wider studies of plant species and other animal species.
○ Due to the above limitation, the current manuscript lacks some case studies in other species, especially in plants.

○
The controls for errors caused by unwanted variations are not proposed in the DEVEA pipeline. For example, RNA-seq data often have batch effects between replicates and they will largely hinder the accuracy of differential expression analysis. DESeq2 also provides suggestions on how to correct the batch effects in the user manual.
○ DEVEA only allows a pair-wise comparison between control and treatment at a time, which is not flexible for complex experimental design, such as datasets with a great number of conditions to compare, datasets of time-series data and development series, and datasets with multiple groups and each group include addition level of conditions to compare. ○ It would be impressive to enable enrichment analysis for the genes lacking annotations. For example, getting the gene annotation by blasting the sequence against some databases. It will be especially useful for the analysis of newly assembled genes and transcripts.

Comments for manuscript improvement:
The preparation of some input data is still difficult for biologists who lack bioinformatics skills. For example, getting the raw read counts requires bioinformatics skills to map the RNA-seq data to a reference genome or transcriptome and generate read counts by using quantification tools such as Salmon, Kallisto and RSEM. The author should propose an easyto-access method and provide step-by-step tutorials for biologists to generate the read counts from raw RNA-seq data, for example, using the Galaxy platform.

○
There are massive methods available for the function of differential expression (DE) analysis. Many comparison studies showed that the methods DESeq2, Limma-voom, edgeR and Sleuth have similar performance in DE analysis and each of these methods has a big ○ user group. To attract more users to DEVEA, the author could highlight the advantages to use DESeq2 in DEVEA compared with other methods, such as Limma and edgeR, in the introduction or discussion.
In the tutorial, the table of contents is floating and overlaps with texts and videos. Is it possible to reallocate the position of the table of contents to avoid overlapping? ○ In the example data link: https://github.com/MiriamRiquelmeP/DEVEA/tree/main/data, it would be better to include a readme file with descriptions of the example datasets.
○ "Supplemental-Information.txt: Running DEVEA locally on a machine": I would recommend the author build DEVEA as an R package. Then, instead of "Download the compressed folder from the DEVEA GitHub repository", users can install and run DEVEA on a local machine as an R package. For example, RNA-seq data often have batch effects between replicates and they will largely hinder the accuracy of differential expression analysis. DESeq2 also provides suggestions on how to correct the batch effects in the user manual. + We can distinguish between modeling the batch effect using DESeq2 within DEVEA and removing the batch effect before importing the data into DEVEA. An advantage of using as input data a DO generated previously in R, is that the user can include Batch in the design formula as "design = ~Batch + Condition". Once the final contrast is specified as Condition_level1_vs_level2, the batch effect will then be 'accounted for' (the statistical inferences will be adjusted for Batch) with a differential expression analysis testing for Condition. Batch is essentially treated in DEVEA as a covariate in the regression model, just as we do in association studies. If the user wants to effectively remove the batch effect for downstream analyses, it can be performed outside DEVEA on the variance-stabilized or rlog expression values using limma::removeBatchEffect() or ComBat-seq tool, before uploading the data to the application again. This could be considered for future versions of DEVEA as well. We have added a section in the 'Data upload and statistical design specification' section of the manuscript.
-DEVEA only allows a pair-wise comparison between control and treatment at a time, which is not flexible for complex experimental design, such as datasets with a great number of conditions to compare, datasets of time-series data and development series, and datasets with multiple groups and each group include addition level of conditions to compare. + Multiple experimental conditions can be included in the design of the DESeq2 object (DO). DEVEA will offer the possibility to consider any of them foreach analysis. For instance, when "design = ~ConditionA + ConditionB + ConditionC", the tool will propose as possible contrast 1.ConditionA_level1_vs_level2, 2.ConditionB_level1_vs_level2 and 3.ConditionC_level1_vs_level2. The user can select from DEVEA interface, which specific contrast to explore. In addition, for DO and CM + SI, when there are more than two levels from the same condition, DEVEA will propose all the one-vs-one combinations. Therefore, if the "design = ~ ConditionWith4levels", the possible contrasts would be 1.Condition_level1_vs_level4, 2.Condition_level2_vs_level4, and 3.Condition_level3_vs_level4, level4 being the basal condition for every contrast. For CM + SI, these levels are selected alphabetically. In a DO, the user can use the function relevel() from DESeq2 in R to specify the basal level. We have changed the text in the manuscript to indicate these possibilities, in the 'Data upload and statistical design specification' from the 'Implementation' section.
-It would be impressive to enable enrichment analysis for the genes lacking annotations. For example, getting the gene annotation by blasting the sequence against some databases. It will be especially useful for the analysis of newly assembled genes and transcripts. + We agree that such an initiative would provide a genuinely useful service, especially for recently sequenced organisms who do not have yet a good and comprehensive annotation. However, this task is far from being trivial and requires a substantial amount of software development and is therefore currently beyond the scope of the current DEVEA implementation. For DEVEA, we currently aim to exploit available packages and resources for visualization, differential expression analysis and enrichment from well-known and described species. -Page 11: "Reads were aligned on the mouse genome (ENSEMBL GRCm38 release 96) and a count matrix was generated." For reproducibility, the tool for read mapping and count matrix generation should be referred to and cited. + The software tools we used for this task and further details are now indicated in the main text.
- Figure 7: the font size of labels and some details of the figures are too small. Would it be better to change the multiple plot layout to 3 rows x 2 columns? + We have tried to reformat the figure layout to 3 rows x 2 columns, but the result was really not satisfying, so we preferred to keep the original layout.
Could the authors specify whether the software support both "." and "," as decimal separators; ○ When uploading the GL with gene symbols, these latter are automatically converted to Ensembl identifiers, but some symbol are not recognized. It is unclear which genes are considered for EA analyses: all genes with symbol, only genes with an Ensembl identifier, only annotated genes with an Ensemble identifier? ○ In some of the EA graphs, deregulated pathways/gene sets are referred to by their url rather than by the pathway name. This complicates the reading of the graphs; ○ Demo data are downloadable, but the link given in the manuscript does not work; ○ Downloads of the SVG images don't seem to work, except for Chordplot in KEEG Enrichment for which the download is different. However, in this latter case, only the plot (but not the legend) is downloaded;

○
The inclusion of GSEA type results in the analysis is interesting. Including the GSEA leading edge analysis in DEVEA would be an interesting add.

○
Further improvements that would be interesting to make: Enrichment analysis: In addition to GO and KEEG analyses, it would also be interesting to add other databases, such as Wiki Pathways for example. ○ It will also be interesting to enable evaluating over-representation for custom genesets (i.e. specific molecular signatures of interest in the research field).
○ Remarks: Assessment as to whether sufficient details of the code, methods and analysis is provided to allow replication of the software development and its use by others, is beyond the competence of the reviewer.

Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes + We agree that the inclusion of the leading edge for GSEA results is interesting. This has been included in DEVEA as an extra column in the table at the GSEA tab, called "leading_edge". It has also been implemented in the plot as a red line indicating the extent of what we consider the leading edge genes. This is also mentioned in the text 'Enrichment analysis (EA) and visualization' section.
-Further improvements that would be interesting to make : Enrichment analysis: In addition to GO and KEEG analyses, it would also be interesting to add other databases, such as Wiki Pathways for example. ○ It will also be interesting to enable evaluating over-representation for custom genesets (i.e. specific molecular signatures of interest in the research field). ○ + Regarding the first further improvement, adding more databases would be a significant extension to the current version of DEVEA, that we want to keep fast and operational. For the second suggestion, the specific custom genesets needed for the over-representation analysis could be very different between users and experiments, hence we prefer to only include the curated well known genesets from the databases. Nevertheless, DEVEA allows the definition of background genes for comparison in the EA part, avoiding the inappropriate inflation of statistical p-values and false-positive results from contrast against the whole genome/proteome. For the moment, we aim at offering a stable and robust version and creating a docker container with these basic functionalities efficiently working in the server. We think that including these significant improvements is beyond the scope of this first DEVEA version. However, these are exciting suggestions to be considered in the following versions. Our goal is to further develop DEVEA according to the needs of the community and bio-informatics developments. This is acknowledged in the 'Conclusion and Future Perspectives' section, and we will consider including new functionalities in a future version of DEVEA.

Competing Interests:
No competing interests were disclosed.